A Chinese text classification model based on radicals and character distinctions
Abstract
Chinese characters are generally correlated with their semantic meanings, and the structure of radicals, in particular, can be a clear indication of how characters are related to each other. In the Chinese character simplification movement, some different traditional characters were merged into one simplified character (a many-to-one mapping), resulting in the phenomenon of "one simplified character corresponding to many traditional characters". Compared with simplified characters, traditional characters contain richer structural information, which is also more meaningful for semantic understanding. Traditional approaches to text modelling often overlook the structural content of Chinese characters and the role of human cognitive behaviour in the process of text comprehension. Hence, we propose a Chinese text classification model derived from the construction methods and evolution of Chinese characters. The model consists of two branches, simplified and traditional, with an attention module based on radicals in each branch. Specifically, we first develop a sequential modelling structure to obtain the sequence information of Chinese texts. Afterwards, an associated word module using the radical ("part head") as a medium is designed to filter out keywords with high semantic differentiation among auxiliary units. An attention module is then implemented to balance the importance of each keyword in a particular context. Our proposed method is evaluated on three datasets to demonstrate its validity and plausibility.
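As a rough illustration of two ideas the abstract mentions (grouping characters by a shared radical, and using attention to weight keywords against a context), the following is a minimal sketch and not the authors' implementation. The radical table `RADICAL_OF`, the function names, and the choice of plain dot-product attention are all assumptions made for the example; the paper's actual modules may differ.

```python
import math

# Hypothetical toy radical table (assumption); a real system would use a
# full character-to-radical dictionary.
RADICAL_OF = {"河": "氵", "海": "氵", "湖": "氵", "树": "木", "林": "木", "的": "白"}


def group_by_radical(text, radical_of=RADICAL_OF):
    """Map each radical to the characters in `text` that carry it."""
    groups = {}
    for ch in text:
        rad = radical_of.get(ch)
        if rad is not None:  # skip characters missing from the toy table
            groups.setdefault(rad, []).append(ch)
    return groups


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(context, keyword_vectors):
    """Dot-product attention: weight keyword vectors by similarity to context.

    Returns the attention weights and the weighted sum of the keyword vectors.
    """
    scores = [sum(c * k for c, k in zip(context, vec)) for vec in keyword_vectors]
    weights = softmax(scores)
    dim = len(context)
    pooled = [
        sum(w * vec[i] for w, vec in zip(weights, keyword_vectors))
        for i in range(dim)
    ]
    return weights, pooled


groups = group_by_radical("河海的树")
weights, pooled = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

Here the keyword vector that points in the same direction as the context receives the larger weight, which is the balancing behaviour the abstract attributes to its attention module.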
Similar resources
Text Content Filtering Based on Chinese Character Reconstruction from Radicals
Content filtering through keyword matching is widely adopted in network censoring and has proven successful. However, a technique that bypasses this kind of censorship by decomposing Chinese characters has appeared recently. Chinese characters are combinations of radicals, and splitting characters into radicals poses a big obstacle to keyword filtering. To tackle this challenge, we proposed the first ...
Feature Selection on Chinese Text Classification Using Character N-Grams
In this paper, we perform Chinese text classification using n-gram text representation on TanCorp, a new large corpus built specifically for Chinese text classification, with more than 14,000 texts divided into 12 classes. We use different n-gram features (1-, 2-grams or 1-, 2-, 3-grams) to represent documents. Different feature weights (absolute text frequency, relative text frequency, absolute n-gram f...
Chinese Character Classification Based on Rough Set and SVM Algorithm
In this paper, we present an integrated approach that combines Rough Set theory and the SVM algorithm. The approach will be divided into two steps. The first step classifies roughly with Rough Set; rules should be induced in this step by the information system. The second step classifies precisely based on the SVM algorithm; in this step we present two new fundamental principles to help us select b...
Mortality Forecasting Based on Lee-Carter Model
Over the past decades, a number of approaches have been applied to forecasting mortality. In 1992, a new method for long-run forecasting of the level and age pattern of mortality was published by Lee and Carter. This method was welcomed by many authors, so it was extended through a wider class of generalized, parametric and nonlinear models. This model represents one of the most influential recent d...
A Character-Net Based Chinese Text Segmentation Method
The segmentation of Chinese texts is a key process in Chinese information processing. The difficulties in segmentation lie in processing ambiguous character strings and unknown Chinese words. To obtain the correct result, the first step is to identify all possible candidate Chinese words in a text. In this paper, a data structure, the Chinese-character-net, is put forward; then, based o...
Journal
Journal title: IEEE Access
Year: 2023
ISSN: 2169-3536
DOI: https://doi.org/10.1109/access.2023.3257339